Abstract: We perform a Bayesian analysis of current neutrino oscillation data. When estimating the oscillation parameters we find that the results generally agree with those of the $\chi^2$ method, with some differences involving $s^2_{23}$ and CP-violating effects. We discuss the additional subtleties caused by the circular nature of the CP-violating phase, and how it is possible to obtain correlation coefficients with $s^2_{23}$. When performing model comparison, we find that there is no significant evidence for any mass ordering, any octant of $s^2_{23}$ or a deviation from maximal mixing, nor the presence of CP-violation.
1 Introduction


Neutrino oscillation experiments have now established beyond doubt that neutrinos are massive and that there is leptonic flavour violation in their propagation [1, 2]; see Ref. [3] for an overview. It has also been clear for more than a decade that a consistent description of the global data on neutrino oscillations is possible by assuming that the three known neutrinos $(\nu_e, \nu_\mu, \nu_\tau)$ are linear quantum superpositions of three massive states $\nu_i$ ($i = 1, 2, 3$) with masses $m_i$. Consequently, a leptonic mixing matrix is present in the weak charged-current interactions [4, 5] of the mass eigenstates, which can be parametrized as [6]:


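$$
U = \begin{pmatrix}
c_{12} c_{13} & s_{12} c_{13} & s_{13} e^{-i\delta_{\rm CP}} \\
-s_{12} c_{23} - c_{12} s_{13} s_{23} e^{i\delta_{\rm CP}} & c_{12} c_{23} - s_{12} s_{13} s_{23} e^{i\delta_{\rm CP}} & c_{13} s_{23} \\
s_{12} s_{23} - c_{12} s_{13} c_{23} e^{i\delta_{\rm CP}} & -c_{12} s_{23} - s_{12} s_{13} c_{23} e^{i\delta_{\rm CP}} & c_{13} c_{23}
\end{pmatrix} \tag{1.1}
$$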


where $c_{ij} \equiv \cos\theta_{ij}$ and $s_{ij} \equiv \sin\theta_{ij}$. If one chooses the convention where the angles $\theta_{ij}$ are taken to lie in the first quadrant, $\theta_{ij} \in [0, \pi/2]$, and the CP phase $\delta_{\rm CP} \in [0, 2\pi]$, then $\Delta m^2_{21} = m^2_2 - m^2_1 > 0$ by convention, and $\Delta m^2_{31}$ can be positive or negative. It is customary to refer to the first option as Normal Ordering (NO), and to the second one as Inverted Ordering (IO). In the following we adopt the (arbitrary) convention of reporting results for $\Delta m^2_{31}$ for NO and $\Delta m^2_{32}$ for IO, i.e., we always use the one which has the larger absolute value. Sometimes we will generically denote such quantity as $\Delta m^2_{3\ell}$, with $\ell = 1$ for NO and $\ell = 2$ for IO.
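As a quick numerical check of the parametrization in Eq. (1.1), the following is a minimal Python sketch. It builds the matrix from $\sin^2\theta_{ij}$ and $\delta_{\rm CP}$ and verifies that it is unitary. The function names `pmns` and `max_unitarity_violation` are hypothetical helpers, not part of any analysis code. The input values are simply the NO best-fit numbers quoted later in Tabs. 2, 3 and 6, used for illustration.

```python
import cmath
import math


def pmns(s12sq, s13sq, s23sq, delta_deg):
    """Mixing matrix of Eq. (1.1) as a 3x3 nested list of complex numbers.

    Inputs are sin^2 of the three angles (first quadrant assumed) and
    delta_CP in degrees; Majorana phases are not included.
    """
    s12, c12 = math.sqrt(s12sq), math.sqrt(1.0 - s12sq)
    s13, c13 = math.sqrt(s13sq), math.sqrt(1.0 - s13sq)
    s23, c23 = math.sqrt(s23sq), math.sqrt(1.0 - s23sq)
    e = cmath.exp(1j * math.radians(delta_deg))  # e^{+i delta_CP}
    return [
        [c12 * c13, s12 * c13, s13 * e.conjugate()],  # s13 e^{-i delta_CP}
        [-s12 * c23 - c12 * s13 * s23 * e, c12 * c23 - s12 * s13 * s23 * e, c13 * s23],
        [s12 * s23 - c12 * s13 * c23 * e, -c12 * s23 - s12 * s13 * c23 * e, c13 * c23],
    ]


def max_unitarity_violation(u):
    """Largest |(U U^dagger - 1)_ab|; vanishes up to rounding for a unitary U."""
    worst = 0.0
    for a in range(3):
        for b in range(3):
            entry = sum(u[a][k] * u[b][k].conjugate() for k in range(3))
            worst = max(worst, abs(entry - (1.0 if a == b else 0.0)))
    return worst


# Illustrative input: NO best-fit values (sin^2 of the angles) and delta_CP = 306 deg.
U = pmns(0.304, 0.0218, 0.452, 306.0)
print(max_unitarity_violation(U))  # rounding-level number
print(abs(U[0][2]) ** 2)           # sin^2(theta_13) up to rounding
```

The check is useful whenever the matrix is rebuilt from sampled parameters. Any sign slip in the $e^{\pm i\delta_{\rm CP}}$ terms shows up immediately as a non-negligible unitarity violation.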
Several global analyses exist in the literature [7-9], which, by fitting the results from the bulk of oscillation experiments, obtain best estimates and allowed ranges for these six oscillation parameters. Generically they obtain their results within a frequentist framework, using a $\chi^2$ statistic.

Alternatively, a consistent approach to obtaining the probability that a certain parameter within a given model takes certain values is provided by Bayesian inference. Furthermore, Bayesian analysis is particularly suited for quantifying how much better one model describes the data than another model. One may therefore ask to what degree the current determination of the oscillation parameters depends on the assumed statistical approach, and whether Bayesian statistics can shed some light on the presently open issues related to the mass ordering, the octant of $\theta_{23}$, and the presence of CP-violation.

In this article we address these questions by performing a Bayesian analysis of the current neutrino oscillation data. In Sec. 2 we briefly describe the elements of Bayesian statistics required for this analysis. In Sec. 3 we present the global results of the analysis and compare them with those of the $\chi^2$ analysis of the same data samples in NuFIT 2.0 [10]. We discuss in detail the main results related to the determination of $\sin^2\theta_{23}$ and $\delta_{\rm CP}$ in Secs. 4 and 5, where we also discuss the additional subtleties caused by the circular nature of the CP-violating phase. In Sec. 6 we study how it is possible to define correlation coefficients with $\sin^2\theta_{23}$. Finally, in Sec. 7 we summarize our conclusions.

2 Statistical framework 

In this work, we will be using Bayesian probability theory, where each proposition is associated with a probability or plausibility, defined to lie between 0 and 1. In order to calculate the probabilities of different assumptions, hypotheses, or models, the laws of probability are applied conditioned on some known (or assumed) information. Of particular interest is Bayes' theorem, which can be used to compare a set of hypotheses $M_i$, using some set of collected data $D$, through calculation of the posterior odds,

$$
\frac{\Pr(M_i|D)}{\Pr(M_j|D)} = \frac{\Pr(D|M_i)\,\Pr(M_i)}{\Pr(D|M_j)\,\Pr(M_j)}. \tag{2.1}
$$

The prior odds $\Pr(M_i)/\Pr(M_j)$ quantify how much more plausible one model is than the other a priori. The evidence, $Z_i = \Pr(D|M_i)$, is the likelihood of the model, quantifying how well the model describes the data. The Bayes factor,

$$
B = Z_i/Z_j, \tag{2.2}
$$

which is the ratio of the evidences, quantifies how much better the model $M_i$ describes the data than $M_j$.

log(odds) | Odds | $\Pr(M_i \mid D)$ | Strength of evidence
< 1.0 | < 3:1 | < 0.75 | Inconclusive
1.0 | ~ 3:1 | ~ 0.75 | Weak evidence
2.5 | ~ 12:1 | ~ 0.92 | Moderate evidence
5.0 | ~ 150:1 | ~ 0.993 | Strong evidence

Table 1. The Jeffreys scale, used for interpretation of Bayes factors, odds, and model probabilities. The posterior model probabilities for the preferred model are calculated assuming only two competing hypotheses and equal prior probabilities. Note that log denotes the natural logarithm.

Given that the model $M$ contains the free parameters $\theta$, the evidence is given by

$$
Z = \Pr(D|M) = \int \mathcal{L}(\theta)\,\pi(\theta)\,d\theta, \tag{2.3}
$$

where $\mathcal{L}(\theta) = \Pr(D|\theta, M)$ is the likelihood function. The prior probability density of the parameters is given by $\pi(\theta) = \Pr(\theta|M)$. It should always be normalized, i.e., it should integrate to unity. The assignment of priors is probably the most discussed and controversial part of Bayesian inference. It is often far from trivial, but it is nevertheless an important, even essential, part of any Bayesian analysis.
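Eq. (2.3) can be made concrete with a one-dimensional toy model. The following minimal Python sketch assumes a Gaussian likelihood (a stand-in, not the oscillation likelihood) and a normalized uniform prior on an interval. The helper names are hypothetical. It also illustrates why the prior matters for the evidence. Once the prior range covers the likelihood peak, widening it by a factor of 5 lowers $Z$ by the same factor, while the best fit is unchanged.

```python
import math


def gaussian_log_like(theta, mu=0.0, sigma=1.0):
    """Toy log-likelihood: a Gaussian peak of width sigma (illustrative only)."""
    return -0.5 * ((theta - mu) / sigma) ** 2


def evidence_uniform_prior(log_like, lo, hi, n=100_000):
    """Eq. (2.3) in one dimension: Z = integral of L(theta) pi(theta) with the
    normalized uniform prior pi = 1/(hi - lo), evaluated by the midpoint rule."""
    width = hi - lo
    h = width / n
    total = sum(math.exp(log_like(lo + (k + 0.5) * h)) for k in range(n))
    return total * h / width


z_narrow = evidence_uniform_prior(gaussian_log_like, -10.0, 10.0)
z_wide = evidence_uniform_prior(gaussian_log_like, -50.0, 50.0)
print(z_narrow, z_wide)             # close to sqrt(2 pi)/20 and sqrt(2 pi)/100
print(math.log(z_narrow / z_wide))  # close to log 5, about 1.61
```

In a realistic multi-dimensional problem, the midpoint rule is replaced by a dedicated sampler such as MultiNest, but the quantity being estimated is the same.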

The Bayes factors, or rather the posterior odds, are interpreted or “translated” into 
ordinary language using the so-called Jeffreys scale, given in Tab. 1 as used in, e.g., 
Refs. [11, 12] (“log” denotes the natural logarithm). Even though the Bayes factor in 
general will favour the correct model once “enough” data have been obtained, the evidence 
is often highly dependent on the choice of prior on the parameters. 
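Translating a Bayes factor into the language of Tab. 1 is mechanical. The following minimal Python sketch implements that translation for two competing models, assuming the thresholds of Tab. 1 act as lower bounds on $|\log(\text{odds})|$. The names `JEFFREYS_SCALE` and `interpret_log_bayes_factor` are hypothetical.

```python
import math

# Thresholds on |log(odds)| (natural log) from the Jeffreys scale of Tab. 1.
JEFFREYS_SCALE = [(5.0, "Strong evidence"),
                  (2.5, "Moderate evidence"),
                  (1.0, "Weak evidence")]


def interpret_log_bayes_factor(log_b, log_prior_odds=0.0):
    """Translate log B = log(Z_i/Z_j) into posterior odds, the posterior
    probability of the favoured model (two competing models only), and a label."""
    log_odds = log_b + log_prior_odds          # Eq. (2.1) in logarithmic form
    strength = abs(log_odds)
    odds = math.exp(strength)                  # odds in favour of the preferred model
    prob = 1.0 / (1.0 + math.exp(-strength))   # Pr(preferred model | D)
    label = "Inconclusive"
    for threshold, name in JEFFREYS_SCALE:
        if strength >= threshold:
            label = name
            break
    return odds, prob, label


print(interpret_log_bayes_factor(5.0))   # odds about 148, probability about 0.993
print(interpret_log_bayes_factor(-0.2))  # probability about 0.55, inconclusive
```

The second call uses the rounded mass-ordering value of Eq. (3.3), $\log B = -0.2$. It gives a posterior probability of roughly 0.55 for the favoured ordering with equal prior odds.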

In principle, the evidence defined above is really the only consistent quantity to judge 
the (relative) merit of a model. However, there are also some so-called information criteria 
which have been used to compare different models, see, e.g., [13, 14]. These do not explicitly 
depend on any prior, but are typically derived using quite restrictive assumptions. This makes their use less reliable, since conclusions based on them could differ substantially from those of a full Bayesian analysis. We will also consider the Akaike Information Criterion (AIC), which is neither a Bayesian nor a frequentist measure. It is motivated by minimizing the expected “distance” between the true data distribution and the data distribution given by the fitted model. It assigns a fixed penalty to each model as¹

$$
{\rm AIC} = -2\log\mathcal{L}_{\max} + 2N_{\rm par} = \chi^2_{\min} + 2N_{\rm par}, \tag{2.4}
$$

dropping an irrelevant constant, and with $N_{\rm par}$ the number of free parameters. Hence, each additional parameter needs to improve the $\chi^2$ by 2 units to make up for the additional complexity. Although great caution should be exercised, typically $Z \propto e^{-{\rm AIC}/2}$ would be used as a proxy for the model likelihood. Hence $-\Delta{\rm AIC}/2$ between two models is read as the log of the Bayes factor and interpreted using Tab. 1. However, unlike the Bayesian evidence, the AIC punishes complex models with additional parameters regardless of whether these are constrained by the data. For parameters which are constrained, the punishment is typically smaller than in the full Bayesian analysis.
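The AIC proxy of Eq. (2.4) needs only $\chi^2_{\min}$ and the parameter count of each model. The following minimal Python sketch shows the arithmetic. The function names are hypothetical and the absolute $\chi^2$ values in the example are made up. Only their difference and the parameter counts matter.

```python
def aic(chi2_min, n_par):
    """Eq. (2.4) with the irrelevant constant dropped: AIC = chi2_min + 2 N_par."""
    return chi2_min + 2 * n_par


def aic_log_bayes_proxy(chi2_a, n_a, chi2_b, n_b):
    """-Delta AIC/2 = -(AIC_a - AIC_b)/2, read (with great caution) as a proxy
    for log(Z_a/Z_b); positive values favour model a."""
    return -(aic(chi2_a, n_a) - aic(chi2_b, n_b)) / 2.0


# Hypothetical numbers: model a has one extra free parameter and improves chi^2 by 0.9.
print(aic_log_bayes_proxy(100.0, 6, 100.9, 5))  # about -0.55: gain smaller than penalty
```

With equal numbers of parameters the proxy reduces to $\Delta\chi^2/2$. For example, the $\Delta\chi^2 = 0.97$ between the two orderings quoted in Sec. 3 gives about 0.5.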

¹ The factor of 2 is just for historical reasons. There is also a modified criterion for small sample sizes, which we do not consider here since the number of samples is rather large.
Under the assumption that a model $M$ is true, complete inference of its parameters is given by the posterior distribution,


$$
\Pr(\theta|D, M) = \frac{\Pr(D|\theta, M)\,\Pr(\theta|M)}{\Pr(D|M)} = \frac{\mathcal{L}(\theta)\,\pi(\theta)}{Z}. \tag{2.5}
$$


In this case, the evidence is only a normalization factor, since it is independent of the values of the parameters $\theta$. It is therefore often disregarded in parameter estimation. Thus the main result of Bayesian parameter inference is the posterior and its marginalized versions, usually in one or two dimensions. In this respect, one must distinguish between the marginal posterior distributions and the marginal likelihood. The latter is the likelihood integrated over all other parameters, after multiplication by the prior of these parameters. The former is a probability distribution, while the latter is not [15]. However, if the parameters of interest have a uniform prior, the marginal posterior distribution and the marginal likelihood are proportional to each other. For the present analysis, only for the derived parameter $J_{\rm CP}$ is the prior sufficiently non-uniform to have a noticeable impact on the posterior, as we will show in Sec. 5.

Generically in parameter inference, point estimates such as the posterior mean or median are given together with credible intervals (regions) for the parameters. A common way to define Bayesian credible intervals for a given parameter is to include all values with a posterior density above a certain value. This, however, makes them non-invariant under non-linear reparametrizations. Invariance can be restored by defining them to be iso-marginal-likelihood intervals instead.² Then the “credible level” of a value $\eta = \eta_0$ of a subset of parameters is simply the posterior probability contained in the region where the (marginal) likelihood is larger than at that value,


$$
{\rm CL}(\eta_0) = \int_{\mathcal{L}(\eta) > \mathcal{L}(\eta_0)} \Pr(\eta|D)\,d\eta. \tag{2.6}
$$


This function is converted to the “number of σ's” in the usual manner as

$$
S = \sqrt{2}\,{\rm erfc}^{-1}(1 - {\rm CL}). \tag{2.7}
$$
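The following minimal grid-based Python sketch implements Eqs. (2.6) and (2.7). It assumes that the posterior and the (marginal) likelihood of a one-dimensional parameter $\eta$ are tabulated on a uniform grid, and the function names are hypothetical. It uses the identity $\sqrt{2}\,{\rm erfc}^{-1}(1 - {\rm CL}) = \Phi^{-1}((1 + {\rm CL})/2)$, with $\Phi$ the standard normal CDF, so that only the Python standard library is needed.

```python
import math
from statistics import NormalDist


def credible_level(posterior, likelihood, like0, d_eta):
    """Eq. (2.6) on a uniform grid: posterior mass of the cells whose (marginal)
    likelihood exceeds like0 = L(eta_0); the posterior is normalized here."""
    norm = sum(posterior) * d_eta
    inside = sum(p for p, l in zip(posterior, likelihood) if l > like0) * d_eta
    return inside / norm


def n_sigma(cl):
    """Eq. (2.7): S = sqrt(2) erfc^{-1}(1 - CL) = Phi^{-1}((1 + CL)/2)."""
    return NormalDist().inv_cdf((1.0 + cl) / 2.0)


# Toy check: Gaussian likelihood with a uniform prior, so posterior is proportional to likelihood.
d_eta = 1e-3
grid = [-8.0 + k * d_eta for k in range(16_001)]
like = [math.exp(-0.5 * x * x) for x in grid]
cl = credible_level(like, like, math.exp(-0.5), d_eta)  # eta_0 one sigma from the peak
print(cl, n_sigma(cl))  # approximately 0.683 and 1.0
```

For a non-uniform prior the `posterior` and `likelihood` lists differ. The region is then still selected by the likelihood, which is what makes the construction invariant under reparametrizations of $\eta$.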

In this work we use MultiNest [16-18], a Bayesian inference tool which, given the
prior and the likelihood, calculates the evidence with an uncertainty estimate, and generates 
posterior samples from distributions that may contain multiple modes and pronounced 
(curving) degeneracies in high dimensions. 


2.1 Priors on oscillation parameters 

In a Bayesian analysis one has to choose a prior on the model parameters, in our case the mixing parameters and mass-squared differences. Before considering any data, this prior should preferably not favour any basis or direction in flavour space, i.e., it should be invariant under rotations, or group transformations [19]. After integrating out nonphysical and potential Majorana phases, this Haar measure of neutrino mixing matrices is the separable measure [20]

$$
\pi(s^2_{12}, c^4_{13}, s^2_{23}, \delta_{\rm CP}) = 1/360^\circ, \tag{2.8}
$$

in the standard parameterization. The prior is uniform in $c^4_{13}$ rather than, for example, in $s^2_{13}$. This is of no practical consequence, since $s^2_{13}$ is well-measured and significantly non-zero [7]. Furthermore, using other, non-invariant priors, such as priors uniform in the angles, will in general not affect the results significantly. For the mass-squared differences, logarithmic priors are used. Since these are also well-measured, their prior is also of no practical significance.

² This only makes sense, as is the case here, when there is a clear separation of data and prior information, the latter being negligible.

In addition, the neutrino mass ordering can be considered as just another free parameter. In this way, the two orderings can be compared. The inference of other quantities can also be performed without assuming either mass ordering to be correct, by averaging over the two orderings. In this last case we take $\pi({\rm NO}) = \pi({\rm IO}) = 0.5$, and we denote this by mixed ordering (MO).

Regarding the experimental nuisance parameters, they are all minimized over, as in a $\chi^2$ analysis. The uncertainties of these parameters are rather small and Gaussian. Including them in the Monte Carlo and integrating over them instead of minimizing over them, as would be the correct procedure in a fully Bayesian analysis, would therefore make a negligible difference.

3 Posterior distributions 

First, under the assumption that three-neutrino mixing is the true model, we perform 
parameter estimation and calculate the posterior distributions of the six free parameters. 
In doing so we include the data from solar [21-30], atmospheric [31], reactor [32-46], and 
long baseline accelerator experiments [47-50], using the same data samples listed in the Appendix of Ref. [7] and used in NuFIT 2.0 [10].

The results are shown in Fig. 1 for NO, Fig. 2 for IO, and Fig. 3 for MO. The posterior distribution for MO is simply the average of the NO and IO posteriors, weighted by the posterior probabilities of the orderings,

$$
\Pr(\theta|D, {\rm MO}) = \sum_{O = {\rm NO,\,IO}} \Pr(\theta|D, O)\,\Pr(O|D). \tag{3.1}
$$
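Eq. (3.1) requires the ordering probabilities $\Pr(O|D) \propto Z_O\,\pi(O)$, which only depend on the difference of log-evidences. The following minimal Python sketch assumes the per-ordering posteriors are tabulated on a common grid. The function names are hypothetical. Only the log-evidence difference of Eq. (3.3) is taken from the text.

```python
import math


def ordering_probabilities(log_z, prior=None):
    """Pr(O|D) proportional to Z_O pi(O), given log-evidences log_z (a dict).
    Equal priors pi(NO) = pi(IO) = 0.5 are used unless stated otherwise."""
    if prior is None:
        prior = {o: 1.0 / len(log_z) for o in log_z}
    top = max(log_z.values())  # subtract the maximum to avoid overflow
    w = {o: math.exp(log_z[o] - top) * prior[o] for o in log_z}
    total = sum(w.values())
    return {o: w_o / total for o, w_o in w.items()}


def mixed_ordering_posterior(densities, probs):
    """Eq. (3.1): pointwise sum of per-ordering posterior densities (on a common
    grid) weighted by Pr(O|D)."""
    n = len(next(iter(densities.values())))
    return [sum(probs[o] * densities[o][k] for o in densities) for k in range(n)]


# Only the difference log Z_NO - log Z_IO = -0.2 (Eq. (3.3)) matters for the weights.
probs = ordering_probabilities({"NO": -0.2, "IO": 0.0})
print(probs)  # roughly NO: 0.45, IO: 0.55
print(mixed_ordering_posterior({"NO": [1.0, 0.0], "IO": [0.0, 1.0]}, probs))
```

With posterior samples instead of grids, the same weights can be applied by drawing from each ordering's chain in proportion to $\Pr(O|D)$.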

From these figures, we conclude that the absolute values of the two mass-squared differences, as well as the mixing angles $s^2_{12}$ and $s^2_{13}$, are well-measured. The posteriors of these parameters are also Gaussian to a very good approximation.

We list in Tab. 2 different point estimates for each of these parameters: the global maximum likelihood (which is the best fit point, bfp, of the $\chi^2$ analysis), the point at which the marginal likelihood is maximal, and the posterior mean and median. The table also contains measures of the uncertainty of each parameter in the form of the 1σ and 3σ Bayesian credible intervals. It also lists the corresponding $\chi^2$ allowed regions at the same CL, which we call $\chi^2$ intervals for simplicity and which are identical to those given in Ref. [7]. As seen in the table, for these four parameters the Bayesian point estimates and uncertainties are practically indistinguishable from their $\chi^2$ counterparts. Thus we conclude that the present determination of these four parameters is very robust under variations of the statistical analysis and prior assumptions.

Normal Ordering
Parameter | bfp | max $\mathcal{L}_{\rm marg}$ | mean | median | $\chi^2$ 1σ | $\chi^2$ 3σ | Bayes 1σ CI | Bayes 3σ CI
$\sin^2\theta_{12}$ | 0.304 | 0.304 | 0.305 | 0.305 | [0.292, 0.317] | [0.270, 0.344] | [0.292, 0.317] | [0.269, 0.344]
$\sin^2\theta_{13}$ | 0.0218 | 0.0218 | 0.0218 | 0.0218 | [0.0208, 0.0228] | [0.0186, 0.0250] | [0.0207, 0.0228] | [0.0187, 0.0250]
$\Delta m^2_{21}$ [$10^{-5}$ eV$^2$] | 7.5 | 7.5 | 7.5 | 7.5 | [7.33, 7.69] | [7.02, 8.07] | [7.33, 7.69] | [7.03, 8.09]
$\Delta m^2_{31}$ [$10^{-3}$ eV$^2$] | 2.457 | 2.460 | 2.459 | 2.459 | [2.417, 2.504] | [2.317, 2.607] | [2.414, 2.506] | [2.320, 2.601]

Inverted Ordering
Parameter | bfp | max $\mathcal{L}_{\rm marg}$ | mean | median | $\chi^2$ 1σ | $\chi^2$ 3σ | Bayes 1σ CI | Bayes 3σ CI
$\sin^2\theta_{12}$ | 0.304 | 0.305 | 0.305 | 0.305 | [0.292, 0.317] | [0.270, 0.344] | [0.292, 0.317] | [0.269, 0.344]
$\sin^2\theta_{13}$ | 0.0219 | 0.0219 | 0.0220 | 0.0220 | [0.0209, 0.0230] | [0.0188, 0.0251] | [0.0209, 0.0231] | [0.0189, 0.0252]
$\Delta m^2_{21}$ [$10^{-5}$ eV$^2$] | 7.5 | 7.5 | 7.5 | 7.5 | [7.33, 7.69] | [7.02, 8.07] | [7.33, 7.68] | [7.02, 8.09]
$\Delta m^2_{32}$ [$10^{-3}$ eV$^2$] | -2.449 | -2.445 | -2.445 | -2.445 | [-2.496, -2.401] | [-2.590, -2.307] | [-2.492, -2.400] | [-2.584, -2.308]

Table 2. Comparison of the results of the $\chi^2$ and Bayesian analyses in the framework of three-flavour oscillations. For the comparison of the determination of $\theta_{23}$ and $\delta_{\rm CP}$ see Secs. 4 and 5.

Considering the comparison between mass orderings, we find that, assuming the same prior probability for both, their posterior probabilities are also very similar. The posterior probability of IO in this case is given by

$$
\Pr({\rm IO}|D) = \frac{\Pr(D|{\rm IO})}{\Pr(D|{\rm IO}) + \Pr(D|{\rm NO})} = \frac{Z_{\rm IO}}{Z_{\rm IO} + Z_{\rm NO}}. \tag{3.2}
$$

The Bayes factor (which is independent of the prior on the ordering) is

$$
\log B = \log\left(\frac{Z_{\rm NO}}{Z_{\rm IO}}\right) = -0.2, \tag{3.3}
$$

i.e., there is an insignificant preference for the inverted ordering. For comparison, the $\chi^2$ analysis finds $\Delta\chi^2 = \chi^2_{\min}({\rm NO}) - \chi^2_{\min}({\rm IO}) = 0.97$. Trivially, this gives $\Delta{\rm AIC}/2 = 0.5$ in favour of IO, which is also what $\log B$ would be if the likelihoods had identical shapes. In summary, both $\Delta\chi^2$ and the Bayesian model comparison agree that there is no evidence for either mass ordering in the present data. However, since the mass ordering is not a continuous parameter, $\Delta\chi^2$ should not be expected to follow a $\chi^2$ distribution. Hence quantifying the degree of favouring or disfavouring of a given ordering based on the corresponding $\Delta\chi^2$ is not fully justified (see Ref. [51] for further discussion).

Finally, we notice that Figs. 1-3 show some differences between the results of the $\chi^2$ and Bayesian analyses where $\delta_{\rm CP}$ or $s^2_{23}$ are involved. For example, we see that the marginalization over $\delta_{\rm CP}$ pulls the bulk of the posterior of $s^2_{23}$ further into the second octant. Motivated by these differences, we present a more detailed study of the results on $s^2_{23}$, $\delta_{\rm CP}$, and CP-violation in the following sections.
[Figure 1: panels not reproduced; legend: Posterior, Profile $\chi^2$.]

Figure 1. One-dimensional posterior distributions (black full lines) and two-dimensional 1σ, 2σ and 3σ Bayesian credible regions (black void contours). The figure also shows the one-dimensional profile likelihoods (red dashed curves) and two-dimensional $\chi^2$ regions (coloured filled regions) from Ref. [7].

4 Determination of $\theta_{23}$

In this section we study the determination of $s^2_{23}$ in more detail. To do so, in Fig. 4 we plot the Bayesian marginal posterior distribution of $s^2_{23}$ for all orderings (which in this case is proportional to the marginal likelihood). We plot it together with the $S$ of the credible intervals (see Eqs. (2.6) and (2.7)), as well as the profile likelihood and $\sqrt{\Delta\chi^2}$ (the nominal significance under the assumption of a standard $\chi^2$ distribution).
[Figure 2: panels in $\sin^2\theta_{12}$, $\sin^2\theta_{23}$, $\sin^2\theta_{13}$; not reproduced.]

Figure 2. Same as Fig. 1 but for IO.

We note that the Bayesian analysis generally prefers the second octant, and does so more than the $\chi^2$ analysis, in particular for NO. The credible and confidence levels differ in the vicinity of the two peaks. However, both peaks are within the 2σ region, and outside of that region the difference between the two analyses is rather small. Typically, the low-credibility Bayesian regions are larger than the small-$\Delta\chi^2$ regions, while the high-credibility Bayesian regions are smaller than the large-$\Delta\chi^2$ ones. This is just what is expected if the likelihood contains a relatively sharp peak on top of a broader plateau containing significant posterior probability.

For completeness, in addition to being displayed in Fig. 4, the point estimates of $s^2_{23}$ are also given in Tab. 3: the global maximum likelihood, the maximum of the marginal likelihood, and the posterior mean and median. In Tab. 4 the measures of uncertainty are given in the form of the posterior standard deviation, as well as the credible intervals corresponding to Fig. 4 and the regular $\chi^2$ intervals.

[Figure 3: panels not reproduced.]

Figure 3. Same as Fig. 1 but for MO.

[Figure 4: panels not reproduced.]

Figure 4. Bayesian posterior/marginal likelihood (black solid), plotted together with the profile likelihood (black dashed) from Ref. [7], both normalized to their maximal value. Also shown are the number of σ's, $S$ (red solid), and $\sqrt{\Delta\chi^2}$ (red dashed), as well as the posterior mean (yellow line), median (green), and maximum of the marginal likelihood (cyan). NO (top left), IO (top right), MO (bottom).

Ordering | Global max | max of $\mathcal{L}_{\rm marg}$ | mean | median
NO | 0.452 | 0.571 | 0.515 | 0.516
IO | 0.579 | 0.576 | 0.541 | 0.555
MO | 0.579 | 0.576 | 0.529 | 0.542

Table 3. Point estimates of $s^2_{23}$.

4.1 Octants of $\theta_{23}$ and maximal mixing

A related question is which octant $\theta_{23}$ belongs to, i.e., whether $s^2_{23}$ is larger or smaller than 0.5. As with the comparison of mass orderings, this is a comparison of two non-nested models with the same number of parameters (although they are “adjacent”). Hence one cannot expect the difference between the $\chi^2$ minima of the two octants to follow a $\chi^2$ distribution. In a Bayesian analysis, however, the comparison is straightforward: one simply integrates the likelihoods over each of the octants.
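Because the prior of Eq. (2.8) is uniform in $s^2_{23}$, both octants carry equal prior mass. The Bayes factor between them is then just the ratio of the integrals of the marginal likelihood over each half. The following minimal Python sketch illustrates this. The two-peak `toy_marg_like` is a hypothetical shape chosen for illustration, not the actual marginal likelihood, and the function names are also hypothetical.

```python
import math


def log_bayes_octants(marg_like, n=20_000):
    """log(Z_2nd / Z_1st) for s23^2 with a prior uniform on [0, 1] (Eq. (2.8)).

    Each octant model uses this prior renormalized on its half, so both
    prior densities equal 2 and cancel: the Bayes factor is the ratio of the
    integrals of the marginal likelihood over the two halves (midpoint rule).
    """
    h = 1.0 / n
    z_first = z_second = 0.0
    for k in range(n):  # n even: a cell edge falls exactly on 0.5
        s = (k + 0.5) * h
        if s < 0.5:
            z_first += marg_like(s)
        else:
            z_second += marg_like(s)
    return math.log(z_second / z_first)


def toy_marg_like(s):
    """Hypothetical two-peak marginal likelihood in s23^2 (not the real one)."""
    return (math.exp(-0.5 * ((s - 0.45) / 0.02) ** 2)
            + 1.5 * math.exp(-0.5 * ((s - 0.58) / 0.02) ** 2))


print(log_bayes_octants(toy_marg_like))  # roughly 0.42 for this toy
```

The result is slightly larger than $\log 1.5$, because part of the first-octant peak leaks across $s^2_{23} = 0.5$. This is exactly the kind of effect that a comparison of $\chi^2$ minima ignores.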

Ordering | Method | st. dev. | 1σ CI | 2σ CI | 3σ CI
NO | Bayes | 0.0585 | [0.433, 0.496] ∪ [0.530, 0.594] | [0.415, 0.613] | [0.389, 0.637]
NO | $\chi^2$ | - | [0.424, 0.505] ∪ [0.554, 0.582] | [0.402, 0.622] | [0.381, 0.643]
IO | Bayes | 0.0534 | [0.514, 0.612] | [0.429, 0.622] | [0.400, 0.640]
IO | $\chi^2$ | - | [0.541, 0.604] | [0.416, 0.625] | [0.388, 0.644]
MO | Bayes | 0.0574 | [0.449, 0.476] ∪ [0.516, 0.607] | [0.422, 0.618] | [0.393, 0.638]
MO | $\chi^2$ | - | [0.448, 0.458] ∪ [0.541, 0.604] | [0.407, 0.625] | [0.385, 0.644]

Table 4. Standard deviations, credible intervals, and $\chi^2$ intervals for $s^2_{23}$.

In addition, one can also consider maximal mixing, $s^2_{23} = 0.5$, as a realistic model, either exactly or approximately. From a statistical viewpoint, a model with a fixed value of a parameter can also be interpreted as a model with some non-zero but negligible deviation from the fixed value, negligible compared to any experimental sensitivity [52]. One can therefore compare maximal mixing with the octants using either viewpoint: treating exact maximal mixing as a possible scenario, or treating it simply as a very good approximation.
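For a point hypothesis such as $s^2_{23} = 0.5$, the evidence is the marginal likelihood at that point. The alternative's evidence is the prior-weighted average of the marginal likelihood over $[0, 1]$. Their ratio is the Savage-Dickey density ratio. The following minimal Python sketch assumes a prior uniform in $s^2_{23}$ and a marginal likelihood supplied as a function. The Gaussian toy shape and the function name are hypothetical.

```python
import math


def log_bayes_nonmax_vs_max(marg_like, n=20_000):
    """log(Z_nonmax / Z_max) with a prior uniform on s23^2 in [0, 1].

    For the point hypothesis s23^2 = 0.5 the evidence is the marginal likelihood
    at that point; for non-maximal mixing it is the prior-weighted average over
    [0, 1] (removing the single point 0.5 does not change the integral).
    """
    h = 1.0 / n
    z_nonmax = sum(marg_like((k + 0.5) * h) for k in range(n)) * h
    z_max = marg_like(0.5)
    return math.log(z_nonmax / z_max)


# Toy: a Gaussian marginal likelihood of width 0.05 centred at maximal mixing.
width = 0.05
print(log_bayes_nonmax_vs_max(lambda s: math.exp(-0.5 * ((s - 0.5) / width) ** 2)))
# about log(width * sqrt(2 pi)), i.e. about -2.08
```

The result is essentially $\log(\text{width}\times\sqrt{2\pi})$. The Occam penalty is therefore set by the ratio of the likelihood width to the unit prior range, so it is fixed by the compactness of the parameter space. It grows as the measurement becomes more precise.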

As always, a model with additional parameters will be punished for this extra complexity. In the present case, this punishment is uniquely fixed by the compactness of the space of allowed values of $s^2_{23}$. The Bayes factors between the second and first octants, as well as between non-maximal and maximal mixing, are given in Tab. 5.³ The second octant is weakly preferred over the first for the inverted ordering, but not for the normal and mixed orderings. Using the AIC, with the values also given in Tab. 5, yields the same conclusions. We remind the reader, however, that interpreting the AIC as a model likelihood should be done with great care. Due to the relatively poor predictivity of the assumption of non-maximal mixing, maximal mixing is weakly preferred over non-maximal mixing in all orderings. Note that $\Delta{\rm AIC}/2$ can never be smaller than $-1$ in this case. The numbers close to that limit simply say that for no ordering is there any preference for non-maximal mixing.

Ordering | 2nd octant vs. 1st: $\log B$, $\Delta{\rm AIC}/2$, $\Delta\chi^2$ | Non-maximal vs. maximal: $\log B$, $\Delta{\rm AIC}/2$, $\Delta\chi^2$
NO | 0.3, -0.5, -0.9 | -1.5, -0.5, 0.9
IO | 1.2, 1.0, 2.0 | -1.2, 0.0, 2.0
MO | 0.7, 0.5, 1.0 | -1.3, 0.0, 2.0

Table 5. Model comparison for different assumptions on $s^2_{23}$: logarithms of Bayes factors, the comparable differences in the AIC, and differences in $\chi^2$ minima. The sign is chosen such that positive values correspond to preference for the first-mentioned assumption in each case, i.e., the 2nd octant and non-maximal mixing, respectively.

³ Ref. [53] also compares the octants and finds $\log B = 0.6$ for all orderings for T2K data, and $\log B = 1.0$-$1.1$ when also including reactor data.

Suppose that in the future the uncertainty on $s^2_{23}$ keeps being reduced while maximal mixing continues to be allowed. At some point, reducing the uncertainty further becomes pointless for the purpose of determining whether maximal mixing is the correct model. Bayesian model comparison quantifies when this point is reached: it is when the evidence in favour of maximal mixing over non-maximal mixing becomes strong.

5 Exploring $\delta_{\rm CP}$ and CP-violation

In this section we study the determination of $\delta_{\rm CP}$ in more detail. In the left panels of Fig. 5 we plot the Bayesian marginal posterior distribution of $\delta_{\rm CP}$ for all orderings, together with the $S$ of the credible intervals, as well as the profile likelihood and $\sqrt{\Delta\chi^2}$. For NO, the marginal and profile likelihoods have their maximum at about the same value of $\delta_{\rm CP}$. For IO and MO, however, the Bayesian analysis prefers larger $\delta_{\rm CP}$. The difference between $S$ and $\sqrt{\Delta\chi^2}$ is not that large, apart from two features. One is the shift just mentioned. The other is that $S$ diverges near $\delta_{\rm CP} \simeq 90^\circ$, while $\sqrt{\Delta\chi^2}$ is bounded by about 2.5.

In the right panels of Fig. 5 the marginal and profile likelihoods are plotted again, but in a polar coordinate system which better reflects the circular nature of $\delta_{\rm CP}$. We note that in a frequentist analysis, the fact that $\delta_{\rm CP}$ is a phase, i.e., a circular, periodic variable, affects the distributions of test statistics [54, 55]. For the present data $\sqrt{\Delta\chi^2}$ is expected to be a poor approximation of the frequentist significance, and typically the true significance will be higher than the naive expectation. Hence, Fig. 5 does not give a direct comparison of frequentist and Bayesian results.

In the Bayesian analysis, however, the circular nature of $\delta_{\rm CP}$ does not affect the posterior distribution or its interpretation. It still needs to be taken into account when summarizing the posterior with point estimates such as the mean or median, or with measures of dispersion such as the standard deviation. This is because the usual, linear definitions of these quantities depend on the arbitrary choice of origin for $\delta_{\rm CP}$ [56-58].

In this respect a useful summary of the distribution of $\delta_{\rm CP}$ is given by the first moment,

$$
m_1 = \langle e^{i\delta_{\rm CP}} \rangle, \tag{5.1}
$$

with $\langle\cdot\rangle$ denoting the posterior mean (indeed, it is $e^{i\delta_{\rm CP}}$ which enters the mixing matrix). The appropriate analogues of the mean and median of $\delta_{\rm CP}$ are the circular mean and circular median. The first one is given by the argument of the first moment,

$$
\bar\delta_{\rm CP} = \arg m_1 = \arg\langle e^{i\delta_{\rm CP}} \rangle. \tag{5.2}
$$

The second is defined via the diameter of the circle that has 0.5 probability on each of its sides: it is the endpoint of that diameter closer to the mean. These point estimates are summarized in Tab. 6 together with the likelihood maxima, and their values are plotted in Fig. 5.
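The following minimal Python sketch estimates the circular mean and median of Eqs. (5.1)-(5.2) from posterior samples in degrees. It is an illustration, not the procedure used for Tab. 6. The median is located by a simple grid scan over diameter orientations, with an assumed default step of 0.5°. The function names are hypothetical.

```python
import cmath
import math


def circular_mean_deg(samples_deg):
    """Eq. (5.2): arg of the first moment m1 = <exp(i delta)> (Eq. (5.1)), in [0, 360)."""
    m1 = sum(cmath.exp(1j * math.radians(d)) for d in samples_deg) / len(samples_deg)
    return math.degrees(cmath.phase(m1)) % 360.0


def arc_distance_deg(a, b):
    """Great-circle (minimum arc) distance between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def circular_median_deg(samples_deg, step=0.5):
    """Circular median from samples: scan diameters on a grid of orientations,
    keep those splitting the samples most evenly, and return the endpoint
    closest to the circular mean."""
    mean = circular_mean_deg(samples_deg)
    n = len(samples_deg)
    best_gap, candidates = None, []
    for k in range(int(round(180.0 / step))):
        phi = k * step
        # fraction of samples on the half circle [phi, phi + 180)
        frac = sum(1 for d in samples_deg if (d - phi) % 360.0 < 180.0) / n
        gap = abs(frac - 0.5)
        if best_gap is None or gap < best_gap - 1e-12:
            best_gap, candidates = gap, [phi, phi + 180.0]
        elif abs(gap - best_gap) <= 1e-12:
            candidates += [phi, phi + 180.0]
    return min(candidates, key=lambda a: arc_distance_deg(a, mean))


# Samples straddling 0/360: the linear mean would be 180, the circular mean is 0 (= 360).
print(circular_mean_deg([350.0, 10.0]))
print(circular_median_deg([250.0, 260.0, 280.0, 290.0]))  # 270.0
```

The first example shows the origin dependence mentioned above. A linear average of 350° and 10° lands on the opposite side of the circle, while the circular mean does not.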


[Figure 5: panels not reproduced.]

Figure 5. Left plots: same as Fig. 4 for $\delta_{\rm CP}$. Right plots: same as left plots, but with only the posterior and profile likelihood, plotted in polar coordinates. For clarity, half of the maximal radius corresponds to zero function value.


Besides the credible intervals, one may want a characterization of the dispersion similar to that provided by the linear standard deviation. For this one can use the fact that $R = |m_1|$ gives a reasonable measure of dispersion, with $R = 0$ for a uniform distribution and $R = 1$ for a degenerate one. However, it may be preferable and more easily interpretable to express such a measure as an expected deviation in radians. The standard linear variance is the expectation of the squared Euclidean distance from the mean. Analogously, in general one could use

$$
V = \langle d^2(\delta_{\rm CP}, \bar\delta_{\rm CP}) \rangle \tag{5.3}
$$

to obtain a dispersion, where $d$ is some metric on the circle. The usual linear metric $d(\alpha, \beta) = |\alpha - \beta|$ is not invariant with respect to the choice of origin. One can instead take $d$ as the minimum arc length between $\alpha$ and $\beta$, also called the great-circle distance. Hence, one can simply take $\sigma^2 = V$ as the variance.

Another metric one can use is the one inherited from the Euclidean embedding,

$$
d'(\alpha, \beta)^2 = |e^{i\alpha} - e^{i\beta}|^2 = (\sin\alpha - \sin\beta)^2 + (\cos\alpha - \cos\beta)^2 = 2\,(1 - \cos(\alpha - \beta)). \tag{5.4}
$$

Then, the variance becomes

$$
V' = \langle d'(\delta_{\rm CP}, \bar\delta_{\rm CP})^2 \rangle = \langle 2\,(1 - \cos(\delta_{\rm CP} - \bar\delta_{\rm CP})) \rangle = 2\,(1 - R), \tag{5.5}
$$

since $m_1 = R\,e^{i\bar\delta_{\rm CP}}$ implies $R = \langle \cos(\delta_{\rm CP} - \bar\delta_{\rm CP}) \rangle$. To get the equivalent deviation as an angle away from the mean, we solve $V' = 2\,(1 - \cos\sigma')$, giving simply

$$
\sigma' = \arccos R. \tag{5.6}
$$

This is the deviation from the mean that has the same squared distance as the expectation over the distribution. These measures of dispersion, together with the corresponding credible intervals, are shown in Tab. 7.
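Both dispersion measures of Eqs. (5.3)-(5.6) can be computed directly from posterior samples. The following minimal Python sketch returns $R$, the great-circle $\sigma = \sqrt{V}$ and $\sigma' = \arccos R$ in degrees. The function name is hypothetical and the example inputs are synthetic.

```python
import cmath
import math


def circular_dispersion_deg(samples_deg):
    """Return (R, sigma, sigma_prime) in the notation of Eqs. (5.1)-(5.6):
    R = |m1|, sigma = sqrt(V) with V the mean squared great-circle distance to
    the circular mean (Eq. (5.3)), and sigma' = arccos(R) (Eq. (5.6)), in degrees."""
    n = len(samples_deg)
    m1 = sum(cmath.exp(1j * math.radians(d)) for d in samples_deg) / n
    r = abs(m1)
    mean = math.degrees(cmath.phase(m1))

    def arc(a, b):  # minimum arc length between two angles in degrees
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    sigma = math.sqrt(sum(arc(d, mean) ** 2 for d in samples_deg) / n)
    sigma_prime = math.degrees(math.acos(min(1.0, r)))  # clamp guards rounding above 1
    return r, sigma, sigma_prime


# Two samples 30 degrees either side of 0: R = cos 30 deg, and sigma = sigma' = 30 deg.
print(circular_dispersion_deg([330.0, 30.0]))
```

For this symmetric two-point example the two measures coincide. For broader or skewed posteriors they differ, as the $\sigma/\sigma'$ column of Tab. 7 shows.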


Ordering | Global max [°] | Max of $\mathcal{L}_{\rm marg}$ [°] | mean [°] | median [°]
NO | 306 | 304 | 289 | 286
IO | 254 | 273 | 262 | 262
MO | 254 | 289 | 271 | 272

Table 6. Point estimates of $\delta_{\rm CP}$. The mean and median are the corresponding circular quantities.


The presence of CP violation can also be studied in terms of the Jarlskog invariant, $J_{\rm CP}$, which, in the standard parameterization, is given by

$$
J_{\rm CP} = J_{\rm CP}^{\max}\sin\delta_{\rm CP} = c_{12} s_{12} c_{23} s_{23} c^2_{13} s_{13} \sin\delta_{\rm CP}. \tag{5.7}
$$
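Eq. (5.7) translates directly into a few lines of code. The following minimal Python sketch takes $\sin^2\theta_{ij}$ and $\delta_{\rm CP}$ in degrees. The function name `jarlskog` is hypothetical. The example input reuses the NO best-fit values from Tabs. 2, 3 and 6 purely for illustration, and is not a derived result of this analysis.

```python
import math


def jarlskog(s12sq, s13sq, s23sq, delta_deg):
    """Eq. (5.7): return (J_max, J_CP) with J_max = c12 s12 c23 s23 c13^2 s13 and
    J_CP = J_max sin(delta_CP); angles in the first quadrant, delta_CP in degrees."""
    s12, c12 = math.sqrt(s12sq), math.sqrt(1.0 - s12sq)
    s13, c13 = math.sqrt(s13sq), math.sqrt(1.0 - s13sq)
    s23, c23 = math.sqrt(s23sq), math.sqrt(1.0 - s23sq)
    j_max = c12 * s12 * c23 * s23 * c13 ** 2 * s13
    return j_max, j_max * math.sin(math.radians(delta_deg))


# Illustrative input: NO global best-fit values.
print(jarlskog(0.304, 0.0218, 0.452, 306.0))  # J_max about 0.033, J_CP about -0.027
```

Applying the same function to samples drawn from the prior of Eq. (2.8) would show the induced, non-uniform priors of $J_{\rm CP}^{\max}$ and $J_{\rm CP}$. These induced priors are what separates the posterior from the marginal likelihood in Fig. 6.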

We plot in Fig. 6 the Bayesian marginal posterior distributions of $J_{\rm CP}$ and $J_{\rm CP}^{\max}$ for all orderings, together with the $S$ of the credible intervals, as well as the profile likelihood and $\sqrt{\Delta\chi^2}$. We note that these are derived parameters, so their priors and posteriors are determined by those of the free oscillation parameters. In particular, their priors are not exactly uniform. For $J_{\rm CP}^{\max}$ (the left panels) the prior is very close to uniform. The figure shows that this parameter is so well constrained that its posterior is essentially Gaussian and agrees with the profile likelihood.

Ordering | Method | σ/σ' [°] | 1σ CI [°] | 2σ CI [°] | 3σ CI [°]
NO | Bayes | 65/58 | [223, 350] | $[42, 139]^c$ | $[84, 94]^c$
NO | $\chi^2$ | - | [234, 346] | $[33, 131]^c$ | -
IO | Bayes | 56/51 | [207, 319] | $[9, 146]^c$ | $[70, 90]^c$
IO | $\chi^2$ | - | [192, 317] | $[8, 142]^c$ | -
MO | Bayes | 61/55 | [211, 333] | $[28, 144]^c$ | $[76, 90]^c$
MO | $\chi^2$ | - | [192, 317] | $[16, 142]^c$ | -

Table 7. Measures of dispersion and credible intervals for $\delta_{\rm CP}$. Here, $I^c$ is the complement of $I$, i.e., all values of $\delta_{\rm CP}$ not contained in $I$.

For $J_{\rm CP}$ (right panels of Fig. 6), we plot both the posterior and the marginal likelihood, and we observe a difference, although it is not very large. A much larger difference is observed between these and the profile likelihood. This translates into a difference in the corresponding CLs ($S$ and $\sqrt{\Delta\chi^2}$). However, this difference is much smaller than one could naively expect from the differences between the posterior and the profile likelihood. The reason is that the Bayesian results depend on the total probability contained in a region, and the sharp peak in the posterior still contains relatively little probability.

The posterior of $J_{\rm CP}$ shows peaks towards the edges of the distribution simply because the density of $|\sin\delta_{\rm CP}|$ is larger for those values. For $\delta_{\rm CP}$ uniform, $x = \sin\delta_{\rm CP}$ has density $1/(\pi\sqrt{1 - x^2})$, which peaks at $x = \pm 1$. This is not cancelled out in the marginal likelihood, because $J_{\rm CP}^{\max}$ has a broad prior, and so does $J_{\rm CP}$. Of course, the symmetry around $J_{\rm CP} = 0$ is broken by the information on $\delta_{\rm CP}$ supplied by the data. As a result, negative values of $J_{\rm CP}$ are preferred, and more strongly so than in the $\chi^2$ analysis. Since we have no freedom left in choosing our priors on the oscillation angles and phase, this is in some sense a robust consequence of consistent Bayesian inference.

5.1 CP-violation vs CP-conservation 

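In the same way as for maximal mixing, one can consider either exact CP-conservation as a possible scenario, or alternatively CP-conservation as simply a very good approximation, and compare the models:

• $M_{\rm CPC}^{0}$: $\delta_{\rm CP} = 0$
• $M_{\rm CPC}^{180}$: $\delta_{\rm CP} = 180^\circ$
• $M_{\rm CPC}$: $M_{\rm CPC}^{0}$ or $M_{\rm CPC}^{180}$ (with equal priors)
• $M_{\rm CPV}$: $\delta_{\rm CP} \in [0^\circ, 360^\circ] \setminus \{0^\circ, 180^\circ\}$, with prior $\pi(\delta_{\rm CP}) = 1/360^\circ$.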
[Figure 6: panels in $J_{\rm CP}$ and $J_{\rm CP}^{\max}$; not reproduced.]

Figure 6. Jarlskog invariant and its maximal value for all orderings. NO (top), IO (middle), MO (bottom).


Note that these assumptions on CPC and CPV are unambiguously defined, in the sense that they do not depend on a parameterization. The prior on $\delta_{\rm CP}$ in $M_{\rm CPV}$ is also uniquely given by the Haar measure. Hence, there is essentially no flexibility remaining in the choice of prior. Due to this fact and the compact nature of the parameter space, the usual pitfalls of model comparison are avoided, or at least heavily mitigated. These pitfalls are the potentially large and prior-dependent penalty acquired for additional parametric complexity.
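Because the prior on $\delta_{\rm CP}$ is fixed, the Bayes factors of the models listed above follow from the marginal likelihood of $\delta_{\rm CP}$ alone. A point hypothesis has evidence equal to that likelihood at the fixed value, while $M_{\rm CPV}$ averages it over the circle, which is the Savage-Dickey density ratio. The following minimal Python sketch assumes this marginal likelihood is available as a function of $\delta_{\rm CP}$ in degrees, with the other parameters already integrated out. The von Mises-like toy shape and the function name are hypothetical.

```python
import math


def cp_log_bayes_factors(marg_like, n=3600):
    """log(Z/Z_CPV) for M_CPC^0, M_CPC^180 and M_CPC, given the marginal
    likelihood of delta_CP (degrees, other parameters already integrated out).

    Under M_CPV the prior is uniform, pi = 1/360 deg, so Z_CPV is the average of
    marg_like over the circle (midpoint rule; removing the two CP-conserving
    points does not change it). A point hypothesis has evidence marg_like(delta_0).
    """
    h = 360.0 / n
    z_cpv = sum(marg_like((k + 0.5) * h) for k in range(n)) / n
    z0, z180 = marg_like(0.0), marg_like(180.0)
    z_cpc = 0.5 * (z0 + z180)  # equal prior weights for 0 and 180 degrees
    return {"CPC0": math.log(z0 / z_cpv),
            "CPC180": math.log(z180 / z_cpv),
            "CPC": math.log(z_cpc / z_cpv)}


# Hypothetical von Mises-shaped marginal likelihood peaked at 270 degrees.
kappa = 1.0
print(cp_log_bayes_factors(lambda d: math.exp(kappa * math.cos(math.radians(d - 270.0)))))
# all three values equal -log I_0(1), about -0.236, for this toy
```

No free prior width enters this calculation, which is the practical meaning of the fixed, small penalty for the additional parameter discussed next.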

This unusually robust (fixed in size) and small penalty for the additional parameter means that the Bayesian analysis is expected to be more powerful at detecting CPV than it normally is at detecting a new physical effect. Hence, compared with a $\chi^2$ analysis, a smaller significance or value of $\Delta\chi^2$ than usual should suffice for a robust Bayesian detection of CPV. Equivalently, a given value of $\Delta\chi^2$ would lead to stronger Bayesian evidence for CPV than the same $\Delta\chi^2$ would yield in a different setting.
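To make the size of this fixed penalty concrete, the sketch below compares a point hypothesis $\delta_{CP} = 0$ with the uniform (Haar) prior of $\mathcal{M}_{\rm CPV}$. It uses a hypothetical one-dimensional toy likelihood, $\chi^2(\delta) - \chi^2_{\min} = 2\kappa[1 - \cos(\delta - \hat\delta)]$, with all other oscillation parameters held fixed, so it is an illustration and not the likelihood of this analysis; `KAPPA`, `DELTA_HAT` and the function names are assumptions. For this toy the log Bayes factor has the closed form $\ln B = -\Delta\chi^2(0)/2 + \kappa - \ln I_0(\kappa)$, which the numerical average should reproduce.

```python
import numpy as np

KAPPA = 2.0              # hypothetical concentration of the toy likelihood
DELTA_HAT = 1.5 * np.pi  # hypothetical best-fit delta_CP (270 degrees)


def toy_chi2(delta):
    """Toy von Mises-shaped chi2 with minimum 0: 2*KAPPA*(1 - cos(delta - DELTA_HAT))."""
    return 2.0 * KAPPA * (1.0 - np.cos(delta - DELTA_HAT))


def log_bayes_point_vs_uniform(chi2, delta_point, n_grid=4096):
    """ln(Z_point / Z_uniform): point hypothesis delta = delta_point versus a uniform
    (Haar) prior 1/(2*pi) on the circle, with likelihood exp(-chi2/2) and all other
    parameters held fixed."""
    grid = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    chi2_grid = chi2(grid)
    chi2_ref = chi2_grid.min()  # common reference value; it cancels in the ratio
    # For a smooth periodic integrand, the plain grid mean is a very accurate
    # estimate of the prior-weighted average, i.e. of Z_uniform / exp(-chi2_ref / 2).
    log_z_uniform = np.log(np.mean(np.exp(-(chi2_grid - chi2_ref) / 2.0)))
    log_z_point = -(chi2(np.asarray(delta_point)) - chi2_ref) / 2.0
    return float(log_z_point - log_z_uniform)


log_b_cpc0 = log_bayes_point_vs_uniform(toy_chi2, 0.0)
# Closed form for this toy: the circle average of exp(KAPPA*(cos(x) - 1)) is exp(-KAPPA)*I0(KAPPA).
log_b_cpc0_exact = float(-toy_chi2(0.0) / 2.0 + KAPPA - np.log(np.i0(KAPPA)))
print(log_b_cpc0, log_b_cpc0_exact)
```

With these toy values, $\Delta\chi^2 = 4$ at $\delta_{CP} = 0$ corresponds to $\ln B \approx -0.82$. For a tighter likelihood, $\ln[e^{-\kappa} I_0(\kappa)] \approx -\tfrac12\ln(2\pi\kappa)$, so the penalty grows only logarithmically and is fixed once the likelihood is, with no prior width to tune.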

Interestingly, the true frequentist significance of CP violation is also expected to be stronger than the naive expectation [55], although the details depend significantly on the (unknown) value of $\theta_{23}$ assumed to be true. This does not happen in a Bayesian analysis, which does not depend on any distribution of test statistics under repeated experiments, but only on the likelihood of the data that were actually observed.

The likelihoods of the different assumptions on $\delta_{CP}$, in the usual form of logarithms of Bayes factors $\log(Z/Z_{\rm CPV})$ relative to $\mathcal{M}_{\rm CPV}$, are shown in Tab. 8, together with the corresponding differences in the AIC and in $\chi^2$. Although CP violation is technically preferred in all cases, in none of them is the evidence even weak. Notice also that, since $\delta_{CP}$ is relatively unconstrained, the preference for CPV is even smaller using the AIC than in the Bayesian analysis.




                          NO      IO      MO
M_CPC^0      log B       -0.1    -0.8    -0.4
             ΔAIC/2       0.1    -0.7    -0.4
             Δχ²         -1.8    -3.4    -2.8
M_CPC^180    log B       -0.4    -0.1    -0.2
             ΔAIC/2       0.1     0.3     0.3
             Δχ²         -1.8    -1.5    -1.5
M_CPC        log B       -0.2    -0.4    -0.3
             ΔAIC/2       0.1     0.3     0.3
             Δχ²         -1.8    -1.5    -1.5

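Table 8. Model comparison for different assumptions on $\delta_{CP}$: logarithms of Bayes factors relative to $\mathcal{M}_{\rm CPV}$, the comparable differences in the AIC, and differences in the minimum $\chi^2$, $\Delta\chi^2 = \chi^2_{\mathcal{M}_{\rm CPV}} - \chi^2_{\mathcal{M}}$. For all quantities, positive values would indicate a preference for the corresponding model over $\mathcal{M}_{\rm CPV}$.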
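The entries of Tab. 8 can be cross-checked with a few lines of arithmetic. This sketch assumes, as the numbers suggest but the text does not state explicitly, that AIC $= 2k + \chi^2_{\min}$ with $\mathcal{M}_{\rm CPV}$ having exactly one more free parameter than each CPC model, and that the evidence of $\mathcal{M}_{\rm CPC}$ is the equal-weight average of the evidences of $\mathcal{M}^{0}_{\rm CPC}$ and $\mathcal{M}^{180}_{\rm CPC}$ (the "equal priors" of the model list). The helper names are hypothetical.

```python
import math

# Values transcribed from Tab. 8 (natural-log Bayes factors and Delta chi^2 relative to M_CPV).
LOG_B_CPC0 = {"NO": -0.1, "IO": -0.8, "MO": -0.4}
LOG_B_CPC180 = {"NO": -0.4, "IO": -0.1, "MO": -0.2}
DCHI2_CPC0 = {"NO": -1.8, "IO": -3.4, "MO": -2.8}
DCHI2_CPC180 = {"NO": -1.8, "IO": -1.5, "MO": -1.5}


def log_b_equal_mixture(log_b1: float, log_b2: float) -> float:
    """ln of (Z1 + Z2) / 2 relative to the same reference evidence (log-sum-exp form)."""
    m = max(log_b1, log_b2)
    return m + math.log((math.exp(log_b1 - m) + math.exp(log_b2 - m)) / 2.0)


def half_delta_aic(delta_chi2: float, extra_params: int = 1) -> float:
    """(AIC_CPV - AIC_M) / 2, assuming AIC = 2k + chi2_min and Delta chi2 = chi2_CPV - chi2_M."""
    return extra_params + delta_chi2 / 2.0


for ordering in ("NO", "IO", "MO"):
    log_b_cpc = log_b_equal_mixture(LOG_B_CPC0[ordering], LOG_B_CPC180[ordering])
    # The union model keeps whichever single point fits better (the larger Delta chi2 here).
    dchi2_cpc = max(DCHI2_CPC0[ordering], DCHI2_CPC180[ordering])
    print(ordering, round(log_b_cpc, 2), dchi2_cpc,
          round(half_delta_aic(DCHI2_CPC0[ordering]), 2),
          round(half_delta_aic(DCHI2_CPC180[ordering]), 2))
```

Under these assumptions the $\mathcal{M}_{\rm CPC}$ rows follow from the two single-point rows. The 0.3 entries in the AIC column come out as 0.25 from the rounded $\Delta\chi^2 = -1.5$, so last-digit mismatches are expected from rounding.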


6 Correlation between $\theta_{23}$ and $\delta_{CP}$

In this section we discuss the possible quantification of the correlation between $\sin^2\theta_{23}$ and $\delta_{CP}$. The posterior in the $s^2_{23}$-$\delta_{CP}$ plane for all the orderings is plotted in Fig. 7, together with the credible regions and $\chi^2$ contours. Although the difference between the Bayesian and $\chi^2$ analyses does not appear to be extremely large, a Bayesian analysis makes some things possible that cannot be done in a $\chi^2$ analysis. In particular, as seen in the figure, $s^2_{23}$ and $\delta_{CP}$ are clearly not independent, and it will be interesting to quantify whether the degeneracy between them persists in future experiments. In a $\chi^2$ analysis, quantifying the "correlation" between two parameters is typically limited to fitting a two-dimensional Gaussian at the best-fit point. In a Bayesian analysis, global measures of association such as the standard Pearson product-moment correlation coefficient are available. However, this coefficient only measures linear association, and is hence less useful when non-linear trends are involved, including multi-modality. In particular, two highly dependent variables can have a very small Pearson correlation. Furthermore, in the present case it fails in an even worse manner, since the Pearson correlation is not circular invariant, i.e., its value depends on the arbitrary choice of origin for $\delta_{CP}$. As regards $\theta_{23}$, one can either treat it as a circular variable or instead use the linear variable $s^2_{23}$.

Footnote: This is the case particularly in the analysis of the current data, where the sensitivity to $\delta_{CP}$ is poor. However, for more sensitive data the behaviour is expected to become more Gaussian [51].
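The origin dependence of the Pearson coefficient is easy to demonstrate. In this sketch the data are a hypothetical toy sample (a standardized linear variable `x` and an angle concentrated around $0^\circ$ that depends linearly on it), not the oscillation posterior; the same angles are merely relabelled by choosing either $[-180^\circ, 180^\circ)$ or $[0^\circ, 360^\circ)$ as their range. The helper `wrap` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(size=n)  # hypothetical standardized linear variable (think of s23^2)
# Hypothetical angle (degrees) concentrated around 0 and linearly tied to x.
angle_deg = 30.0 * x + rng.normal(scale=10.0, size=n)


def wrap(angle, origin):
    """Relabel angles (degrees) so that they lie in [origin, origin + 360)."""
    return np.mod(angle - origin, 360.0) + origin


r_origin_m180 = np.corrcoef(x, wrap(angle_deg, -180.0))[0, 1]  # range [-180, 180)
r_origin_0 = np.corrcoef(x, wrap(angle_deg, 0.0))[0, 1]        # range [0, 360)
print(r_origin_m180, r_origin_0)
```

With this toy, the first labelling gives a strong positive correlation, while putting the cut at $0^\circ$ splits the cloud across the boundary and flips the sign, although the underlying joint distribution is the same.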



Figure 7. Posterior in the $s^2_{23}$-$\delta_{CP}$ plane (blue shading), $1\sigma$, $2\sigma$, $3\sigma$ credible regions (black) and $\chi^2$ contours (red dashed). NO (top left), IO (top right), MO (bottom).

So let us focus on how to define a correlation coefficient that can overcome these limitations. Typically, a correlation coefficient aims to quantify how much of the variation in one variable can be explained by the variation in another one; for example, to what extent the linear relation $\langle Y | X = x\rangle = ax + b$ is responsible for the variation in $Y$ (which leads to the standard Pearson correlation coefficient). Similarly, one can consider circular-circular association between two circular variables $\Theta$ and $\Phi$ (in this case $\delta_{CP}$ and $\theta_{23}$), circular-linear association, predicting the expectation of $\Theta$ given $X = x$, or linear-circular association, predicting the expectation of $X$ given $\Theta = \theta$ (in these cases $X = \sin^2\theta_{23}$).

Many measures of correlation involving circular variables already exist in the literature (see [56-59]). For two circular variables a simple one is

$$\rho_{cc} = \frac{\langle \sin(\Theta - \bar\Theta)\,\sin(\Phi - \bar\Phi)\rangle}{\sqrt{\langle \sin^2(\Theta - \bar\Theta)\rangle\,\langle \sin^2(\Phi - \bar\Phi)\rangle}}, \tag{6.1}$$

where the bar denotes the circular mean. This coefficient has many properties in common with its linear counterpart: it is confined to the interval $[-1, 1]$, it vanishes if the variables are independent, and it numerically agrees with the linear version for concentrated distributions.
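A sample-based version of Eq. (6.1) is straightforward to sketch, assuming the averages are taken over posterior samples and the circular mean is computed as $\bar\Theta = \operatorname{atan2}(\langle\sin\Theta\rangle, \langle\cos\Theta\rangle)$. The function names and the concentrated toy sample are hypothetical.

```python
import numpy as np


def circular_mean(a):
    """Circular mean direction (radians) of the angles a."""
    return np.arctan2(np.mean(np.sin(a)), np.mean(np.cos(a)))


def rho_cc(theta, phi):
    """Sample estimate of the circular-circular correlation of Eq. (6.1); inputs in radians."""
    s_theta = np.sin(theta - circular_mean(theta))
    s_phi = np.sin(phi - circular_mean(phi))
    return np.mean(s_theta * s_phi) / np.sqrt(np.mean(s_theta**2) * np.mean(s_phi**2))


# Hypothetical, concentrated toy sample with linear correlation of about 0.8.
rng = np.random.default_rng(3)
z1, z2 = rng.normal(size=(2, 50_000))
theta = 0.05 * z1
phi = 0.05 * (0.8 * z1 + 0.6 * z2)
print(rho_cc(theta, phi), np.corrcoef(theta, phi)[0, 1])
```

Because only deviations from the circular mean enter, relabelling the origin of either angle (for instance mapping the sample to $[0, 2\pi)$) leaves the estimate unchanged, and for this concentrated sample it agrees closely with the Pearson value from `np.corrcoef`.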

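An alternative, but slightly more complex, correlation coefficient for two circular variables is the T-linear one of Ref. [60],

$$\rho_T = \frac{\langle \sin(\Theta_1 - \Theta_2)\,\sin(\Phi_1 - \Phi_2)\rangle}{\sqrt{\langle \sin^2(\Theta_1 - \Theta_2)\rangle\,\langle \sin^2(\Phi_1 - \Phi_2)\rangle}}, \tag{6.2}$$

where $\Theta_1$ and $\Theta_2$ are treated as two independent copies of $\Theta$, and similarly for $\Phi$ (i.e., $(\Theta_1, \Phi_1)$ and $(\Theta_2, \Phi_2)$ are independent draws of the pair $(\Theta, \Phi)$).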
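Averaging Eq. (6.2) over all pairs of samples does not require an explicit double loop. The sketch below assumes paired samples of the two angles (in radians) and uses the expansion $\sin(a_1 - a_2) = \sin a_1\cos a_2 - \cos a_1\sin a_2$, which reduces the pair averages to products of one-sample means; the function name and toy sample are hypothetical.

```python
import numpy as np


def rho_t(theta, phi):
    """T-linear circular-circular correlation, Eq. (6.2), estimated from paired samples (radians).

    Expanding sin(a1 - a2) = sin(a1)cos(a2) - cos(a1)sin(a2) and averaging over all pairs
    (i, j) of samples turns the pair averages into products of one-sample means. Pairs with
    i == j contribute zero, so including them only rescales numerator and denominator by the
    same factor n/(n-1), which cancels in the ratio.
    """
    s, c = np.sin(theta), np.cos(theta)
    t, d = np.sin(phi), np.cos(phi)
    m = np.mean
    num = m(s * t) * m(c * d) - m(s * d) * m(c * t)
    den = np.sqrt((m(s * s) * m(c * c) - m(s * c) ** 2) * (m(t * t) * m(d * d) - m(t * d) ** 2))
    return num / den


rng = np.random.default_rng(4)
theta = rng.uniform(0.0, 2.0 * np.pi, size=50_000)                        # hypothetical toy angles
phi = (theta + rng.normal(scale=0.5, size=theta.size)) % (2.0 * np.pi)  # noisy copy of theta
print(rho_t(theta, phi))
```

Identical angles give $\rho_T = 1$, reflected angles ($\Phi = -\Theta$) give $\rho_T = -1$, and rotating either variable leaves the value unchanged, since only differences of angles enter.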

For linear-circular association, one can also split the circular variable into its sine and cosine and consider the multiple correlation coefficient between $X$ and $(\sin\Theta, \cos\Theta)$, giving

$$\rho_{lc}^2 = \frac{\rho_{xc}^2 + \rho_{xs}^2 - 2\rho_{xs}\rho_{xc}\rho_{cs}}{1 - \rho_{cs}^2}, \tag{6.3}$$

with $\rho_{xc} = \rho(X, \cos\Theta)$, $\rho_{xs} = \rho(X, \sin\Theta)$ and $\rho_{cs} = \rho(\cos\Theta, \sin\Theta)$ being standard linear correlation coefficients. We notice that, since it is defined through its square, only $|\rho_{lc}|$ is known, and it hence gives no information on the “sign” or “direction” of the association.
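Eq. (6.3) can be evaluated directly from three ordinary Pearson coefficients. A minimal sketch, assuming samples of a linear variable `x` and an angle `theta` in radians; the function name and the toy data are hypothetical.

```python
import numpy as np


def rho_lc_abs(x, theta):
    """|rho_lc| of Eq. (6.3): multiple correlation of x with (cos theta, sin theta); theta in radians."""
    r_xc = np.corrcoef(x, np.cos(theta))[0, 1]
    r_xs = np.corrcoef(x, np.sin(theta))[0, 1]
    r_cs = np.corrcoef(np.cos(theta), np.sin(theta))[0, 1]
    rho2 = (r_xc**2 + r_xs**2 - 2.0 * r_xc * r_xs * r_cs) / (1.0 - r_cs**2)
    return float(np.sqrt(rho2))


# Hypothetical toy: x depends on the angle through cos(theta - 1) plus noise,
# so the population value is sqrt(0.5 / (0.5 + 0.25)) = sqrt(2/3).
rng = np.random.default_rng(6)
theta = rng.uniform(0.0, 2.0 * np.pi, size=100_000)
x = np.cos(theta - 1.0) + 0.5 * rng.normal(size=theta.size)
print(rho_lc_abs(x, theta), np.sqrt(2.0 / 3.0))
```

Shifting the origin of `theta` only mixes $\cos\Theta$ and $\sin\Theta$ linearly, so the multiple correlation is unchanged; flipping the sign of `x` also leaves it unchanged, which is the missing "direction" noted above.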

While the above measures of association overcome the problem of circular invariance, they are still sensitive only to a limited kind of association, and they can be zero even when the variables are highly dependent on each other. It could hence be of interest to have a measure that can quantify any type of dependence, and that is zero only when the variables are independent. Such a measure, based on information theory, is the mutual information [61-64]. This is the information gained by knowing the full distribution $P(x,y)$ rather than only the marginal distributions $P(x)$ and $P(y)$, or, equivalently, the average information gained on $X$ by knowing the value of $Y$ (and vice versa). It can be expressed as the so-called Kullback-Leibler divergence between $P_{X,Y}$ and the product $P_X P_Y$,

$$I(X,Y) = \int\!\!\int dx\,dy\; P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)}. \tag{6.4}$$

Using the natural logarithm gives the result in nats, while using base 2 gives it in bits. It holds that $I(X,Y) \geq 0$, with equality if and only if $X$ and $Y$ are independent. Next, in order to make the connection with the standard correlation coefficient, we note that for a two-dimensional Gaussian distribution (for which no correlation is equivalent to independence), $I = \log(1/\sqrt{1-\rho^2})$ in nats, and so we define

$$\rho_I^2 = 1 - e^{-2I}. \tag{6.5}$$


We have now constructed a correlation coefficient that is independent of any boundary conditions on the variables and is invariant under arbitrary (invertible) univariate redefinitions of $x$ and $y$, which the others are not. Like the previous coefficients, it also reduces to the standard Pearson coefficient (in absolute value) in the limit of a concentrated Gaussian distribution. However, like $|\rho_{lc}|$, it only measures the degree of dependence, not any "direction" of the association.
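Eqs. (6.4) and (6.5) can be tried out on a Gaussian toy sample, where the exact answer is known. This sketch swaps in a simple histogram plug-in estimator of the mutual information, which is not the kernel density estimate used for the results of this analysis; the bin count, sample size and function names are assumptions.

```python
import numpy as np


def mutual_information_hist(x, y, bins=40):
    """Plug-in estimate of I(X, Y) in nats from a 2D histogram.

    A simple stand-in for Eq. (6.4); it is biased upward for finite samples and
    depends on the binning, unlike the exact mutual information.
    """
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = p_xy > 0                           # empty cells contribute 0 * log 0 = 0
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x * p_y)[nz])))


def rho_i_abs(mutual_info_nats):
    """|rho_I| from Eq. (6.5), rho_I^2 = 1 - exp(-2 I), with I in nats."""
    return float(np.sqrt(1.0 - np.exp(-2.0 * mutual_info_nats)))


rng = np.random.default_rng(5)
RHO_TRUE = 0.6  # hypothetical toy correlation
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, RHO_TRUE], [RHO_TRUE, 1.0]], size=200_000)
i_gauss = mutual_information_hist(xy[:, 0], xy[:, 1])
i_exact = -0.5 * np.log(1.0 - RHO_TRUE**2)  # Gaussian mutual information in nats

indep = rng.normal(size=(2, 200_000))  # independent toy pair
i_indep = mutual_information_hist(indep[0], indep[1])
print(i_gauss, i_exact, rho_i_abs(i_gauss), rho_i_abs(i_indep))
```

For the correlated pair, $|\rho_I|$ comes out close to the input value 0.6. The independent pair does not give exactly zero, a small example of the finite-sample bias that affects estimates of the mutual information.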

Our estimates of the different correlation coefficients are given in Tab. 9. For all measures, we find stronger correlation in NO than in IO, typically significantly so (with the exception of $\rho_T$). Furthermore, the two signed circular-circular measures have significantly smaller absolute values than the others, and for these we also find that the correlation in MO is actually larger than in both NO and IO, which is not the case for the others. We note that all are smaller than or equal in size to $|\rho_I|$. This is to some extent expected, as $|\rho_I|$ in some sense measures "all" the dependence between $\delta_{CP}$ and $s^2_{23}$.



              NO      IO      MO
ρ_cc        -0.20   -0.15   -0.21
ρ_T         -0.14   -0.13   -0.16
|ρ_lc|       0.27    0.16    0.23
|ρ_I|        0.30    0.18    0.26

Table 9. Different correlation coefficients between $\theta_{23}$ and $\delta_{CP}$.


7 Summary 

We have presented the results of a Bayesian global analysis of solar, atmospheric, reactor and accelerator neutrino data in the framework of three-neutrino oscillations and compared them with those from the standard $\chi^2$ analysis in NuFIT 2.0 [10]. The results are summarized in Fig. 1 for NO, Fig. 2 for IO, and Fig. 3 for MO, where we compare the relevant Bayesian quantities (the posterior distribution and two-dimensional Bayesian credible regions) with the profile likelihood and the two-dimensional $\chi^2$ allowed regions.

We found that the four parameters $\Delta m^2_{3\ell}$, $\Delta m^2_{21}$, $\theta_{12}$ and $\theta_{13}$ are well measured and that their posterior distributions are Gaussian to a very good approximation. The corresponding Bayesian credible intervals at a given CL are also very similar to the $\chi^2$ allowed regions at the same CL, as seen in Table 2.

We found some differences between the results of the $\chi^2$ and Bayesian analyses where $\delta_{CP}$ or $\theta_{23}$ are involved. In particular, the marginalization over $\delta_{CP}$ pulls the bulk of the posterior of $\theta_{23}$ further into the second octant, which has some effect on the parameter ranges and on the comparison of the quality of the description between the octants. We study the determination of $\theta_{23}$ in more detail in Sec. 4 and conclude that the Bayesian analysis generally prefers the second octant more than the $\chi^2$ analysis does, in particular for NO. The credible and confidence levels differ in the vicinity of the two peaks, but both peaks are within the $2\sigma$ regions. Altogether, the low-credibility Bayesian regions are larger than the small-$\chi^2$ regions, while the high-credibility Bayesian regions are smaller than the large-$\chi^2$ ones.

Footnote: We note that large biases may occur in the estimation of the mutual information [62, 63]. As before, we use kernel density estimates of the densities, similar to Ref. [64], and our very large sample sizes ensure an accurate estimate.

Regarding the present determination of $\delta_{CP}$, presented in Sec. 5, we found that for NO the marginal and profile likelihoods have their maximum at about the same value of $\delta_{CP}$, but for IO and MO the Bayesian analysis prefers slightly larger values of $\delta_{CP}$. Also, unlike the $\chi^2$ interval, the $3\sigma$ Bayesian credible interval does not contain the full range of $\delta_{CP}$: some values near $\pi/2$ are not included. We have also introduced and quantified two measures of the dispersion of $\delta_{CP}$, analogous to the linear standard deviation but valid for a circular variable.

In addition, we have studied the Jarlskog invariant, $J_{CP}$, as well as its maximal value over $\delta_{CP}$, $J_{CP}^{\max}$, and find that the posterior distribution of $J_{CP}^{\max}$ is perfectly Gaussian and agrees with the profile likelihood. For $J_{CP}$, large differences appear between the posterior distribution and the profile likelihood, leading to some differences in the corresponding CL intervals. In particular, we find that negative values of $J_{CP}$ are preferred in both analyses, but more strongly in the Bayesian than in the $\chi^2$ analysis.

The possible quantification of the correlation between $\theta_{23}$ and $\delta_{CP}$, taking into account their circular nature, has been discussed in Sec. 6. In particular, we have introduced a new correlation coefficient, $\rho_I$, defined in terms of the mutual information, which is independent of any boundary conditions on the variables and is invariant under arbitrary univariate redefinitions of them. Quantitatively, we always find a stronger correlation between $\delta_{CP}$ and $\theta_{23}$ in NO than in IO.

Finally, we note that a Bayesian analysis is particularly suited for quantifying how much better one model describes the data than another, a comparison expressed in terms of the Bayes factor of the two models (which equals the posterior odds when both models are assumed to be equally probable a priori). We have applied this to the comparison between the mass orderings, the octant of $\theta_{23}$, and the presence of CP violation, with the following conclusions:


• Regarding the comparison between the two orderings, we find that, assuming the same prior probability for both, their posterior probabilities are also very similar: 0.55 for IO and 0.45 for NO, with a logarithm of the Bayes factor (NO relative to IO) of $-0.2$, which implies that the slight preference for the inverted ordering is not statistically meaningful.

• Applied to the octant of $\theta_{23}$, we find that the second octant is weakly preferred over the first for the inverted ordering, but not for the normal ordering, nor in the case of no assumption or knowledge on the ordering. Also, due to the relatively poor predictivity of the assumption of non-maximal mixing, maximal mixing is weakly preferred over non-maximal mixing in all orderings.

• As for CP violation, we find that although CP violation is technically preferred over CP conservation (either for $\delta_{CP} = 0$ or $\delta_{CP} = \pi$), the corresponding logarithm of the Bayes factor is always smaller than 1 in absolute value, i.e., the corresponding evidence is not even weak.
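The link between the Bayes factor and the posterior probabilities quoted for the mass orderings can be made explicit: the posterior odds are the prior odds times the Bayes factor, so with equal priors they equal the Bayes factor. A minimal sketch, assuming only two exhaustive models; the function name is hypothetical.

```python
import math


def posterior_probability(log_bayes_factor: float, prior_odds: float = 1.0) -> float:
    """P(A | data) for two exhaustive models A and B, given ln(Z_A / Z_B) and prior odds P(A)/P(B)."""
    posterior_odds = prior_odds * math.exp(log_bayes_factor)
    return posterior_odds / (1.0 + posterior_odds)


p_no = posterior_probability(-0.2)  # ln(Z_NO / Z_IO) = -0.2 with equal prior probabilities
print(round(p_no, 2), round(1.0 - p_no, 2))
```

This reproduces the 0.45 and 0.55 quoted above; conversely, $\ln(0.45/0.55) \approx -0.2$.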